328 research outputs found

    ExpCLIP: Bridging Text and Facial Expressions via Semantic Alignment

    Full text link
    The objective of stylized speech-driven facial animation is to create animations that encapsulate specific emotional expressions. Existing methods often depend on pre-established emotional labels or facial expression templates, which may limit the necessary flexibility for accurately conveying user intent. In this research, we introduce a technique that enables the control of arbitrary styles by leveraging natural language as emotion prompts. This technique presents benefits in terms of both flexibility and user-friendliness. To realize this objective, we initially construct a Text-Expression Alignment Dataset (TEAD), wherein each facial expression is paired with several prompt-like descriptions.We propose an innovative automatic annotation method, supported by Large Language Models (LLMs), to expedite the dataset construction, thereby eliminating the substantial expense of manual annotation. Following this, we utilize TEAD to train a CLIP-based model, termed ExpCLIP, which encodes text and facial expressions into semantically aligned style embeddings. The embeddings are subsequently integrated into the facial animation generator to yield expressive and controllable facial animations. Given the limited diversity of facial emotions in existing speech-driven facial animation training data, we further introduce an effective Expression Prompt Augmentation (EPA) mechanism to enable the animation generator to support unprecedented richness in style control. Comprehensive experiments illustrate that our method accomplishes expressive facial animation generation and offers enhanced flexibility in effectively conveying the desired style

    DGraph: A Large-Scale Financial Dataset for Graph Anomaly Detection

    Full text link
    Graph Anomaly Detection (GAD) has recently become a hot research spot due to its practicability and theoretical value. Since GAD emphasizes the application and the rarity of anomalous samples, enriching the varieties of its datasets is a fundamental work. Thus, this paper present DGraph, a real-world dynamic graph in the finance domain. DGraph overcomes many limitations of current GAD datasets. It contains about 3M nodes, 4M dynamic edges, and 1M ground-truth nodes. We provide a comprehensive observation of DGraph, revealing that anomalous nodes and normal nodes generally have different structures, neighbor distribution, and temporal dynamics. Moreover, it suggests that those unlabeled nodes are also essential for detecting fraudsters. Furthermore, we conduct extensive experiments on DGraph. Observation and experiments demonstrate that DGraph is propulsive to advance GAD research and enable in-depth exploration of anomalous nodes.Comment: 9 page

    Prompt-based Node Feature Extractor for Few-shot Learning on Text-Attributed Graphs

    Full text link
    Text-attributed Graphs (TAGs) are commonly found in the real world, such as social networks and citation networks, and consist of nodes represented by textual descriptions. Currently, mainstream machine learning methods on TAGs involve a two-stage modeling approach: (1) unsupervised node feature extraction with pre-trained language models (PLMs); and (2) supervised learning using Graph Neural Networks (GNNs). However, we observe that these representations, which have undergone large-scale pre-training, do not significantly improve performance with a limited amount of training samples. The main issue is that existing methods have not effectively integrated information from the graph and downstream tasks simultaneously. In this paper, we propose a novel framework called G-Prompt, which combines a graph adapter and task-specific prompts to extract node features. First, G-Prompt introduces a learnable GNN layer (\emph{i.e.,} adaptor) at the end of PLMs, which is fine-tuned to better capture the masked tokens considering graph neighborhood information. After the adapter is trained, G-Prompt incorporates task-specific prompts to obtain \emph{interpretable} node representations for the downstream task. Our experiment results demonstrate that our proposed method outperforms current state-of-the-art (SOTA) methods on few-shot node classification. More importantly, in zero-shot settings, the G-Prompt embeddings can not only provide better task interpretability than vanilla PLMs but also achieve comparable performance with fully-supervised baselines.Comment: Under revie

    Integrated photonics modular arithmetic processor

    Full text link
    Integrated photonics computing has emerged as a promising approach to overcome the limitations of electronic processors in the post-Moore era, capitalizing on the superiority of photonic systems. However, present integrated photonics computing systems face challenges in achieving high-precision calculations, consequently limiting their potential applications, and their heavy reliance on analog-to-digital (AD) and digital-to-analog (DA) conversion interfaces undermines their performance. Here we propose an innovative photonic computing architecture featuring scalable calculation precision and a novel photonic conversion interface. By leveraging Residue Number System (RNS) theory, the high-precision calculation is decomposed into multiple low-precision modular arithmetic operations executed through optical phase manipulation. Those operations directly interact with the digital system via our proposed optical digital-to-phase converter (ODPC) and phase-to-digital converter (OPDC). Through experimental demonstrations, we showcase a calculation precision of 9 bits and verify the feasibility of the ODPC/OPDC photonic interface. This approach paves the path towards liberating photonic computing from the constraints imposed by limited precision and AD/DA converters.Comment: 23 pages, 9 figure

    Fast-HuBERT: An Efficient Training Framework for Self-Supervised Speech Representation Learning

    Full text link
    Recent years have witnessed significant advancements in self-supervised learning (SSL) methods for speech-processing tasks. Various speech-based SSL models have been developed and present promising performance on a range of downstream tasks including speech recognition. However, existing speech-based SSL models face a common dilemma in terms of computational cost, which might hinder their potential application and in-depth academic research. To address this issue, we first analyze the computational cost of different modules during HuBERT pre-training and then introduce a stack of efficiency optimizations, which is named Fast-HuBERT in this paper. The proposed Fast-HuBERT can be trained in 1.1 days with 8 V100 GPUs on the Librispeech 960h benchmark, without performance degradation, resulting in a 5.2x speedup, compared to the original implementation. Moreover, we explore two well-studied techniques in the Fast-HuBERT and demonstrate consistent improvements as reported in previous work

    Online near-infrared analysis coupled with MWPLS and SiPLS models for the multi-ingredient and multi-phase extraction of licorice (Gancao)

    Get PDF
    Additional file 1. Table S1. The sampling intervals in different extraction phases. Table S2. The HPLC results of different indicators. Table S3. The evaluation parameters of PLS and SiPLS models

    Closed-loop Controlled Brillouin Optical Time-Domain Analysis

    Get PDF
    A closed-loop controlled BOTDA distributed optical fibre sensor is proposed for tracking fast temperature-strain evolution. The measurement time is reduced by two orders of magnitude with respect to classical BOTDA sensing, while keeping the same accuracy and measurement conditions

    Increasing robustness of bipolar pulse coding in Brillouin distributed fiber sensors

    Get PDF
    The robustness of bipolar pulse coding against pump depletion issues in Brillouin distributed fiber sensors is theoretically and experimentally investigated. The presented analysis points out that the effectiveness of bipolar coding in Brillouin sensing can be highly affected by the power unbalance between -1's and +1's elements resulting from depletion and amplification of coded pump pulses. In order to increase robustness against those detrimental effects and to alleviate the probe power limitation imposed by pump depletion, a technique using a three-tone probe is proposed. Experimental results demonstrate that this method allows increasing the probe power by more than 12.5 dB when compared to the existing single-probe tone implementation. This huge power increment, together with the 13.5 dB signal-to-noise enhancement provided by 512-bit bipolar Golay codes, has led to low-uncertainty measurements (< 0.9 MHz) of the local Brillouin peak gain frequency over a real remoteness of 100 km, using a 200 km-long fiber-loop and 2 m spatial resolution. The method is evaluated with a record figure-of-merit of 380'000
    • …
    corecore